Maximizing Data Locality in Hadoop Clusters via Controlled Reduce Task Scheduling
Authors: not recorded
Abstract
The overall goal of this project is to gain hands-on experience working on a large, open-ended, research-oriented project using the Hadoop framework. Hadoop is an open-source implementation of MapReduce and the Google File System, and it is currently enjoying wide popularity. Students will modify the Hadoop task scheduler, conduct several experimental studies, and analyze the resulting performance and network traffic.
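To make the idea of locality-aware reduce placement concrete, here is a minimal sketch of one possible heuristic. It is not the project's prescribed scheduler and not a Hadoop API. It assumes the scheduler already knows how many bytes of each reduce partition every node's map tasks produced (the `partition_bytes` table and the functions `place_reducers` and `remote_bytes` are hypothetical names introduced for illustration). It greedily places each reduce task on the node with a free slot that already stores the largest share of that task's input, so that less intermediate data has to cross the network during the shuffle.

```python
from typing import Dict


def place_reducers(
    partition_bytes: Dict[int, Dict[str, int]],
    free_slots: Dict[str, int],
) -> Dict[int, str]:
    """Greedily place each reduce partition on the node storing most of its input.

    partition_bytes[p][node]: bytes of partition p's map output held on `node`
    (hypothetical input; assumed fully known, i.e. all maps have finished).
    free_slots[node]: how many more reduce tasks `node` can accept.
    """
    slots = dict(free_slots)  # work on a copy; do not mutate the caller's dict
    placement: Dict[int, str] = {}
    # Largest partitions first: placing them locally saves the most traffic.
    order = sorted(partition_bytes, key=lambda p: -sum(partition_bytes[p].values()))
    for p in order:
        candidates = sorted(n for n, free in slots.items() if free > 0)
        if not candidates:
            raise RuntimeError(f"no free reduce slot for partition {p}")
        # max() keeps the first maximum, so ties go to the alphabetically first node.
        best = max(candidates, key=lambda n: partition_bytes[p].get(n, 0))
        placement[p] = best
        slots[best] -= 1
    return placement


def remote_bytes(
    partition_bytes: Dict[int, Dict[str, int]],
    placement: Dict[int, str],
) -> int:
    """Total intermediate bytes reducers must pull from other nodes over the network."""
    return sum(
        size
        for p, node in placement.items()
        for holder, size in partition_bytes[p].items()
        if holder != node
    )


if __name__ == "__main__":
    # Hypothetical map-output sizes (bytes) per reduce partition and node.
    sizes = {
        0: {"node1": 300, "node2": 100},
        1: {"node1": 50, "node2": 200, "node3": 150},
        2: {"node3": 500},
    }
    plan = place_reducers(sizes, {"node1": 1, "node2": 1, "node3": 1})
    print(plan, remote_bytes(sizes, plan))  # {2: 'node3', 0: 'node1', 1: 'node2'} 300
```

The greedy order is a simple heuristic, not an optimal assignment. Because the sketch assumes complete knowledge of map-output sizes, it also illustrates why controlling *when* reduce tasks are scheduled matters: the later a reducer is placed, the more of its input distribution is known.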
Similar resources
Scheduling algorithm based on prefetching in MapReduce clusters
Due to cluster resource competition and task scheduling policy, some map tasks are assigned to nodes without input data, which causes significant data access delay. Data locality is becoming one of the most critical factors to affect performance of MapReduce clusters. As machines in MapReduce clusters have large memory capacities, which are often underutilized, in-memory prefetching input data ...
Hadoop Scheduling Base On Data Locality
In Hadoop, job scheduling is an independent module, so users can design their own job scheduler based on their actual application requirements and thereby meet their specific business needs. Currently, Hadoop has three schedulers: FIFO, capacity scheduling, and fair scheduling, all of which use a task allocation strategy that takes data locality into account only in a simple way. They neither suppor...
Shareability and Locality Aware Scheduling Algorithm in Hadoop for Mobile Cloud Computing
Using different scheduling algorithms can affect the performance of mobile cloud computing using the Hadoop MapReduce framework. In the Hadoop MapReduce framework, the default scheduling algorithm is First-In-First-Out (FIFO). However, the FIFO scheduler simply schedules tasks according to their arrival time and does not consider any other factors that may have a great impact on system performance. As a res...
Data-Replicas Scheduler for Heterogeneous MapReduce Cluster
Large-scale data processing has grown rapidly in recent years. The MapReduce programming model, which first appeared in functional languages, was adopted in distributed systems and has performed excellently in large-scale data processing since 2006. Hadoop, the most popular open-source MapReduce runtime environment, supplies reliable, scalable and distributed system processing la...
Predoop: Preempting Reduce Task for Job Execution Accelerations
Map/Reduce is a popular parallel processing framework for data-intensive computing. To overlap the Map task's execution phase with the Reduce task's intermediate-data fetching and merging phase, existing Map/Reduce schedulers always pre-launch the Reduce task once a specific threshold of its map tasks has been launched, and this pattern incurs the occupation of the consuming resources o...
Publication date: 2011